{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 概述"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 李航《统计学习方法》（第一版）主要包括如下十大模型方法：\n",
    "    * 感知机，Perceptron\n",
    "    * K近邻，KNN，k-nearest neighbor\n",
    "    * 朴素贝叶斯，NB，naive bayes\n",
    "    * 决策树，DT，decision tree\n",
    "    * 逻辑斯谛回归，LR，logistic regression\n",
    "    * 支持向量机，SVM，support vector machines\n",
    "    * 提升方法，Boosting\n",
    "    * 期望最大化，EM，expectation maximization\n",
    "    * 隐马尔科夫，HMM，hidden markov\n",
    "    * 条件随机场，CRF，conditional random field\n",
    "    \n",
    "记忆口诀：感邻贝树逻，S提E马随\n",
    "\n",
    "形象解释：为了感谢邻居背着一箩筐小树苗，还有四个蹄子的一匹马跟在身后\n",
    "\n",
    "代码链接：https://github.com/fengdu78/lihang-code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* SKlearn 主要包含如下六大模块：\n",
    "    * 分类\n",
    "    * 回归\n",
    "    * 聚类\n",
    "    * 降维\n",
    "    * 模型选择\n",
    "    * 数据预处理\n",
    "    \n",
    "![](./sklearn.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* XGBoost 向来被称为处理 Kaggle 竞赛中非感知数据的刷榜神器，演化路线如下：\n",
    "\n",
    "![](./xgboost.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 环境\n",
    "* pip install scikit-learn\n",
    "* pip install xgboost\n",
    "\n",
    "本文主要基于最简单的分类数据集IRIS做简单演示，对比SVM和XGBoost的效果。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. SVM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 数据及模型和评测"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn import datasets  # 导入自带数据集\n",
    "from sklearn import svm  # 导入svm模块\n",
    "from sklearn.model_selection import train_test_split  # 数据集划分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(150, 4) (150,) (120, 4) (30, 4) (120,) (30,)\n"
     ]
    }
   ],
   "source": [
    "iris = datasets.load_iris()  # 数据的加载\n",
    "X, Y = iris.data, iris.target  # 输入及输出\n",
    "X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)\n",
    "print(X.shape,\n",
    "      Y.shape,\n",
    "      X_train.shape,\n",
    "      X_test.shape,\n",
    "      Y_train.shape,\n",
    "      Y_test.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n",
       "  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',\n",
       "  max_iter=-1, probability=False, random_state=None, shrinking=True,\n",
       "  tol=0.001, verbose=False)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = svm.SVC()  # 初始化模型\n",
    "model.fit(X_train, Y_train)  # 训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.93333333333333335"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.score(X_test, Y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 模型序列化及推理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['iris-svm.pkl']"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.externals import joblib  # 环境\n",
    "\n",
    "joblib.dump(model, \"iris-svm.pkl\")  # 序列化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2] 2\n"
     ]
     }
    ],
    "source": [
     "model_iris_svm = joblib.load(\"iris-svm.pkl\")  # load the serialized model\n",
     "out_ = model_iris_svm.predict(X_test[0, :].reshape(1, -1))  # predict expects a 2D array: (1 sample, 4 features)\n",
     "real = Y_test[0]\n",
     "print(out_, real)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. XGBoost"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参考1：https://www.kdnuggets.com/2017/03/simple-xgboost-tutorial-iris-dataset.html\n",
    "\n",
    "参考2：https://zhuanlan.zhihu.com/p/31182879"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1 简单入手"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import xgboost as xgb  # 环境"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 将 numpy 数据转化为 xgboost 的 DMatrix 数据\n",
    "d_train = xgb.DMatrix(X_train, label=Y_train)  # 训练集\n",
    "d_test = xgb.DMatrix(X_test, label=Y_test)  # 测试集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# XGBoost 参数设置：不同的数据集使用不同的参数会表现得更好\n",
    "params = {\n",
    "    'booster': 'gbtree',\n",
    "    'objective': 'multi:softmax',\n",
    "    'num_class': 3,\n",
    "    'gamma': 0.1,\n",
    "    'max_depth': 6,\n",
    "    'lambda': 2,\n",
    "    'subsample': 0.7,\n",
    "    'colsample_bytree': 0.7,\n",
    "    'min_child_weight': 3,\n",
    "    'silent': 1,\n",
    "    'eta': 0.1,\n",
    "    'seed': 1000,\n",
    "    'nthread': 4,\n",
    "}\n",
    "\n",
    "num_round = 500  # the number of training iterations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 模型训练\n",
    "model_iris_xgboost = xgb.train(param, d_train, num_round)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 模型结构\n",
    "model_iris_xgboost.dump_model(\"iris_xgboost.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[  7.23272283e-03,   2.63873991e-02,   9.66379821e-01],\n",
       "       [  9.87908781e-01,   1.08116809e-02,   1.27950928e-03],\n",
       "       [  1.47451961e-03,   1.07322144e-03,   9.97452199e-01],\n",
       "       [  2.37509632e-03,   9.96875167e-01,   7.49728351e-04],\n",
       "       [  4.49852832e-03,   9.93392050e-01,   2.10937834e-03],\n",
       "       [  1.47450750e-03,   1.08149787e-03,   9.97444034e-01],\n",
       "       [  4.35259007e-03,   9.90142465e-01,   5.50486334e-03],\n",
       "       [  9.92838919e-01,   5.87525545e-03,   1.28589466e-03],\n",
       "       [  4.09123302e-03,   9.88222539e-01,   7.68627552e-03],\n",
       "       [  7.77279818e-03,   9.87108409e-01,   5.11880685e-03],\n",
       "       [  5.63382776e-03,   9.91724432e-01,   2.64172326e-03],\n",
       "       [  5.09332819e-03,   9.92518425e-01,   2.38828105e-03],\n",
       "       [  9.92493451e-01,   6.22116169e-03,   1.28544716e-03],\n",
       "       [  9.87908781e-01,   1.08116809e-02,   1.27950928e-03],\n",
       "       [  4.05580401e-02,   6.63167238e-01,   2.96274751e-01],\n",
       "       [  4.78960527e-03,   9.93698478e-01,   1.51189859e-03],\n",
       "       [  9.92838919e-01,   5.87525545e-03,   1.28589466e-03],\n",
       "       [  2.12180540e-02,   8.49925458e-01,   1.28856480e-01],\n",
       "       [  9.92838919e-01,   5.87525545e-03,   1.28589466e-03],\n",
       "       [  4.17817896e-03,   9.90537584e-01,   5.28427958e-03],\n",
       "       [  4.17817896e-03,   9.90537584e-01,   5.28427958e-03],\n",
       "       [  1.47451961e-03,   1.07322144e-03,   9.97452199e-01],\n",
       "       [  4.25838213e-03,   9.86985862e-01,   8.75576213e-03],\n",
       "       [  4.44163708e-03,   9.75944042e-01,   1.96142811e-02],\n",
       "       [  1.46313419e-03,   9.95122015e-01,   3.41480318e-03],\n",
       "       [  5.63382776e-03,   9.91724432e-01,   2.64172326e-03],\n",
       "       [  1.41233264e-03,   9.97961521e-01,   6.26133580e-04],\n",
       "       [  5.48171951e-03,   9.91947949e-01,   2.57039908e-03],\n",
       "       [  6.53205905e-03,   3.04865688e-02,   9.62981403e-01],\n",
       "       [  2.58926842e-02,   8.28525007e-01,   1.45582274e-01]], dtype=float32)"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 模型推理\n",
    "model_iris_xgboost.predict(d_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 实际上输出数据的每一列分别代表对应的类别号 0、1、2\n",
    "* 对每个数据点选择对应行概率最高那一列的索引作为最终的输出结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2 0 2 1 1 2 1 0 1 1 1 1 0 0 1 1 0 1 0 1 1 2 1 1 1 1 1 1 2 1]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "result = np.array([np.argmax(line) for line in model_iris_xgboost.predict(d_test)])\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEWCAYAAACOv5f1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHAdJREFUeJzt3X2cVHXd//HXmxsF2ZIIMQQRyUjBFVLyJokgWw1vSs0r\n87K8j7wK/flQ6tKuFCh7pKbd/koTbzA17Uca3pQYqWteFhYod96AmVuIN4CJsEABy+f3x5zFAXZh\nBvbMmeW8n4/HPDhz5syc9xx233Pme2bPKCIwM7N86ZB1ADMzqzyXv5lZDrn8zcxyyOVvZpZDLn8z\nsxxy+ZuZ5ZDL32wzkm6QdHnWOczSJH/O39qKpAZgT6CpaPbAiHh1Bx5zJHBHRPTdsXTtk6TJwCsR\n8Y2ss9jOxXv+1tZOiIiaost2F39bkNQpy/XvCEkds85gOy+Xv1WEpMMl/VHScklzkj365tvOlvS8\npJWS/ibpS8n8bsBDwF6SGpPLXpImS7qy6P4jJb1SdL1B0n9LmgusktQpud89kpZKelnShVvJuvHx\nmx9b0tckLZH0mqQTJR0raaGkf0r6etF9J0j6laRfJs/naUlDim4/QFJ9sh2elfSpzdZ7vaTfSloF\nnAucDnwtee4PJMtdKuml5PGfk3RS0WOcJel/JV0r6a3kuY4uur2HpFslvZrcPrXotuMlzU6y/VHS\nQSX/B1u74/K31EnqA/wGuBLoAYwD7pG0R7LIEuB44N3A2cD3JR0cEauA0cCr2/FO4jTgOKA7sAF4\nAJgD9AGOAi6SdEyJj/U+oEty3yuAScDngUOAjwKXS9q3aPlPA1OS5/oLYKqkzpI6Jzl+B/QCLgDu\nlPTBovv+J/Bt4F3Az4E7gWuS535CssxLyXp3ByYCd0jqXfQYhwELgJ7ANcDNkpTcdjuwGzA4yfB9\nAEkfAm4BvgS8F/gZcL+kXUvcRtbOuPytrU1N9hyXF+1Vfh74bUT8NiI2RMR0YCZwLEBE/CYiXoqC\nxymU40d3MMePImJRRKwBPgzsERHfjIi1EfE3CgX+uRIfax3w7YhYB9xNoVR/GBErI+JZ4DlgSNHy\nsyLiV8ny36PwwnF4cqkBrkpyPAo8SOGFqtl9EfFksp3+1VKYiJgSEa8my/wSeBE4tGiRv0fEpIho\nAm4DegN7Ji8Qo4HzI+KtiFiXbG+AMcDPIuKpiGiKiNuAfyeZbSfUbsdDrWqdGBG/32zePsB/SDqh\naF5n4DGAZFhiPDCQwg7JbsC8HcyxaLP17yVpedG8jsATJT7Wm0mRAqxJ/n2j6PY1FEp9i3VHxIZk\nSGqv5tsiYkPRsn+n8I6ipdwtknQGcDHQP5lVQ+EFqdnrRetfnez011B4J/LPiHirhYfdBzhT0gVF\n83Ypym07GZe/VcIi4PaI+OLmNyTDCvcAZ1DY612XvGNoHqZo6eNoqyi8QDR7XwvLFN9vEfByRHxg\ne8Jvh72bJyR1APoCzcNVe0vqUPQC0A9YWHTfzZ/vJtcl7UPhXctRwJ8ioknSbN7ZXluzCOghqXtE\nLG/htm9HxLdLeBzbCXjYxyrhDuAEScdI6iipS3IgtS+FvctdgaXA+uRdwNFF930DeK+k3YvmzQaO\nTQ5evg+4aBvr/zOwMjkI3DXJcKCkD7fZM9zUIZJOTj5pdBGF4ZMZwFPAagoHcDsnB71PoDCU1Jo3\ngAFF17tReEFYCoWD5cCBpYSKiNcoHED/qaT3JBlGJDdPAs6XdJgKukk6TtK7SnzO1s64/C11EbGI\nwkHQr1MorUXAV4EOEbESuBD4f8BbFA543l903xeAu4C/JccR9qJw0HIO0EDh+MAvt7H+JgoHlIcC\nLwPLgJsoHDBNw33AqRSezxeAk5Px9bUUyn50kuGnwBnJc2zNzcCg5mMoEfEccB3wJwovDLXAk2Vk\n+wKFYxgvUDjQfhFARMwEvgj83yT3X4
Gzynhca2f8R15mbUjSBGC/iPh81lnMtsZ7/mZmOeTyNzPL\nIQ/7mJnlkPf8zcxyqGo/59+9e/fYb7/9so6xhVWrVtGtW7esY2yhWnNB9WZzrvI4V3myyjVr1qxl\nEbHHNheMiKq8DBw4MKrRY489lnWEFlVrrojqzeZc5XGu8mSVC5gZJXSsh33MzHLI5W9mlkMufzOz\nHHL5m5nlkMvfzCyHXP5mZjnk8jczyyGXv5lZDrn8zcxyyOVvZpZDLn8zsxxy+ZuZ5ZDL38wsh1z+\nZmY55PI3M8shl7+ZWQ65/M3Mcsjlb2aWQy5/M7MccvmbmeWQy9/MLIdc/mZmOeTyNzPLIZe/mVkO\nufzNzHLI5W9mlkMufzOzHHL5m5nlkMvfzCyHXP5mZjnk8jczyyGXv5lZDrn8zcxyyOVvZpZDLn8z\nsxxy+ZuZ5ZDL38wshxQRWWdoUb8B+0WHz/4w6xhbuKR2PdfN65R1jC1Uay6o3mzOVR7nKs/kT3Zj\n5MiRFV+vpFkRMWxby3nP38wsJf/617849NBDGTJkCIMHD2b8+PEAzJkzhyOOOILa2lpOOOEEVqxY\nAcDatWs5++yzqa2tZciQIdTX16eWLbXyl3ShpOcl3SnpR5L+KmmupIPTWqeZWTXZddddefTRR5kz\nZw6zZ89m2rRpzJgxg/POO4+rrrqKefPmcdJJJ/Hd734XgEmTJgEwb948pk+fziWXXMKGDRtSyZbm\nnv+XgTrgTuADyWUMcH2K6zQzqxqSqKmpAWDdunWsW7cOSSxcuJARI0YAUFdXxz333APAc889x8c/\n/nEAevXqRffu3Zk5c2Yq2VIpf0k3AAOAh4BfAz+PghlAd0m901ivmVm1aWpqYujQofTq1Yu6ujoO\nO+wwBg8ezH333QfAlClTWLRoEQBDhgzh/vvvZ/369bz88svMmjVr421tLbUDvpIagGHAZOCqiPjf\nZP4jwH9HxBYvZ5LGUHh3QM+eexxyxQ8mpZJtR+zZFd5Yk3WKLVVrLqjebM5VHucqz767d9y41w/Q\n2NjI5ZdfzoUXXkjHjh358Y9/zNtvv82RRx7Jvffey3333UdTUxM33HADzzzzDHvuuSdNTU0cf/zx\nDB8+vOT1jho1qqQDvlV1iDwibgRuhMKnfarxCH61frKgWnNB9WZzrvI4V3la+rTP008/zZtvvsm4\nceM444wzAFi4cCHPPvvsxmWPOuqojct/5CMf4eSTT2bQoEFtnq8Sn/ZZDOxddL1vMs/MbKe2dOlS\nli9fDsCaNWuYPn06+++/P0uWLAFgw4YNXHnllZx//vkArF69mlWrVgEwffp0OnXqlErxQ2X2/O8H\nxkq6GzgMeDsiXqvAes3MMvXaa69x5pln0tTUxIYNG/jsZz/L8ccfzw9/+EN+8pOfAHDyySdz9tln\nA7BkyRKOOeYYOnToQJ8+fbj99tvTCxcRqVyABqAnIOAnwEvAPGBYKfcfOHBgVKPHHnss6wgtqtZc\nEdWbzbnK41zlySoXMDNK6NjU9vwjon/R1a+ktR4zMyuf/8LXzCyHXP5mZjnk8jczyyGXv5lZDrn8\nzcxyyOVvZpZDLn8zsxxy+ZuZ5ZDL38wsh1z+ZmY55PI3M8shl7+ZWQ65/M3Mcsjlb2aWQy5/M7Mc\ncvmbmeWQy9/MLIdc/mZmOeTyNzPLIZe/mVkOufzNzHLI5W9mlkMufzOzHHL5m5nlkMvfzCyHXP5m\nZjnk8jczyyGXv5lZDrn8zcxyyOVvZpZDLn8zsxxy+ZuZ5ZDL38wshzplHaA1a9Y10f/S32QdYwuX\n1K7nrJ08V8NVxwFwzjnn8OCDD9KrVy/mz58PwFe/+lUeeOABdtllF97//vdz66230r179433/cc/\n/sGgQYOYMGEC48aNa5M8Ztb2Utvzl3ShpOclhaS5kuZJ+qOkIWmt09rWWWedxbRp0zaZV1dXx/z5\n85
k7dy4DBw7kO9/5zia3X3zxxYwePbqSMc1sO6Q57PNloA44EvhYRNQC3wJuTHGd1oZGjBhBjx49\nNpl39NFH06lT4Q3j4YcfziuvvLLxtqlTp7LvvvsyePDgiuY0s/KlUv6SbgAGAA8Bh0XEW8lNM4C+\naazTKu+WW27ZuJff2NjI1Vdfzfjx4zNOZWalUESk88BSAzAsIpYVzRsH7B8R57VynzHAGICePfc4\n5IofTEol247Ysyu8sSbrFFtqy1y1fXbfOP36669z2WWXceutt26yzB133MGCBQv45je/iSSuv/56\n9t9/f0aNGsXkyZPp2rUrp556KlB4YaipqWmbcG3IucrjXOXJKteoUaNmRcSwbS1XsQO+kkYB5wLD\nW1smIm4kGRbqN2C/uG5e9R2PvqR2PTt7robTR74z3dBAt27dGDnynXmTJ0/m2Wef5ZFHHmG33XYD\n4PLLL+epp57itttuY/ny5XTo0IHBgwczduxY6uvrN7l/tXCu8jhXeao1V7OKtJikg4CbgNER8WYl\n1mnpmDZtGtdccw2PP/74xuIHeOKJJzZOT5gwgZqaGsaOHZtFRDMrQdlj/pLek5R5qcv3A+4FvhAR\nC8tdn2XntNNO44gjjmDBggX07duXm2++mbFjx7Jy5Urq6uoYOnQo559/ftYxzWw7lLTnL6ke+FSy\n/CxgiaQnI+LiEu5+BfBe4KeSANaXMh5l2bvrrru2mHfuuedu834TJkxIIY2ZtaVSh312j4gVks4D\nfh4R4yXN3dodIqJ/MnlecilL184dWZD8sVE1qa+v32RMvFpUay4zq06lDvt0ktQb+CzwYIp5zMys\nAkot/28CDwMvRcRfJA0AXkwvlpmZpamkYZ+ImAJMKbr+N+AzaYUyM7N0lbTnL2mgpEckzU+uHyTp\nG+lGMzOztJQ67DMJuAxYBxARc4HPpRXKzMzSVWr57xYRf95s3vq2DmNmZpVRavkvk/R+IAAknQK8\nlloqMzNLVamf8/8KhXPu7C9pMfAycHpqqczMLFXbLH9JHSicnfMTkroBHSJiZfrRzMwsLdsc9omI\nDcDXkulVLn4zs/av1DH/30saJ2lvST2aL6kmMzOz1JQ65n9q8u9XiuYFhW/rMjOzdqbUv/DdN+0g\nZmZWOaWe0vmMluZHxM/bNo6ZmVVCqcM+Hy6a7gIcBTwNuPzNzNqhUod9Lii+Lqk7cHcqiczMLHVl\nf41jYhXg4wBmZu1UqWP+D5Cc2oHCC8Ygik7xbGZm7UupY/7XFk2vB/4eEa+kkMfMzCqg1GGfYyPi\n8eTyZES8IunqVJOZmVlqSi3/uhbmjW7LIGZmVjlbHfaR9F/Al4EBkuYW3fQu4Mk0g5mZWXq2Neb/\nC+Ah4DvApUXzV0bEP1NLZWZmqdpq+UfE28DbwGkAknpR+COvGkk1EfGP9COamVlbK/UL3E+Q9CKF\nL3F5HGig8I7AzMzaoVIP+F4JHA4sTE7ydhQwI7VUZmaWqlLLf11EvAl0kNQhIh4DhqWYy8zMUlTq\nH3ktl1QDPAHcKWkJhVM8mJlZO1Tqnv+ngdXARcA04CXghLRCmZlZuko9q+cqSfsAH4iI2yTtBnRM\nN5qZmaWl1E/7fBH4FfCzZFYfYGpaoczMLF2lDvt8BTgSWAEQES8CvdIKZWZm6Sq1/P8dEWubr0jq\nxDuneDYzs3am1E/7PC7p60BXSXUUzvfzQHqxYM26Jvpf+ps0V7FdLqldz1ntNFfDVcdVKI2ZVbtS\n9/wvBZYC84AvAb8FvpFWKEvXOeecQ69evTjwwAM3zpsyZQqDBw+mQ4cOzJw5c+P8N998k1GjRlFT\nU8PYsWOziGtmKdhq+UvqBxARGyJiUkT8R0SckkxvddhH0oWSnpf0lqS5kmZLmilpeFs+ASvfWWed\nxbRp0zaZd+CBB3LvvfcyYsSITeZ36dKFb33rW1x77bWY2c5jW3v+
Gz/RI+meMh/7yxS+B2BvYEhE\nDAXOAW4q83GsjY0YMYIePXpsMu+AAw7ggx/84BbLduvWjeHDh9OlS5dKxTOzCthW+atoekCpDyrp\nhmT5h4AvFr1L6IYPFJuZZW5bB3yjlemt3ynifEmfBEZFxDJJJ1H4ToBeQKtHHSWNAcYA9Oy5B1fU\nri91lRWzZ9fCwdVqU0qu+vr6jdOvv/46q1at2mQewPLly5k1axaNjY2bzH/hhRdYvHjxFsuXorGx\ncbvulzbnKo9zladaczXbVvkPkbSCwjuArsk0yfWIiHeXspKI+DXwa0kjgG8Bn2hluRuBGwH6Ddgv\nrptX6oeRKueS2vW011wNp498Z7qhgW7dujFy5MhNlunevTuHHHIIw4Ztet6+hoYGGhsbt1i+FPX1\n9dt1v7Q5V3mcqzzVmqvZtr7MpU1P4RARf5A0QFLPiFjWlo9tZmalS30XVtJ+wEsREZIOBnYF3kx7\nvda60047jfr6epYtW0bfvn2ZOHEiPXr04IILLmDp0qUcd9xxDB06lIcffhiA/v37s2LFCtauXcvU\nqVP53e9+x6BBgzJ+Fma2IyoxfvEZ4AxJ64A1wKnb+pgoQNfOHVlQhX+UVF9fv8nwSbUoJ9ddd93V\n4vyTTjqpxfkNDQ3bmcrMqlVq5R8R/ZPJq5OLmZlViVL/wtfMzHYiLn8zsxxy+ZuZ5ZDL38wsh1z+\nZmY55PI3M8shl7+ZWQ65/M3Mcsjlb2aWQy5/M7MccvmbmeWQy9/MLIdc/mZmOeTyNzPLIZe/mVkO\nufzNzHLI5W9mlkMufzOzHHL5m5nlkMvfzCyHXP5mZjnk8jczyyGXv5lZDrn8zcxyyOVvZpZDLn8z\nsxxy+ZuZ5ZDL38wsh1z+ZmY55PI3M8shl7+ZWQ65/M3Mcsjlb2aWQ52yDtCaNeua6H/pb7aY33DV\ncRunv//973PTTTchidraWm699Va6dOlSyZhmZu1Sanv+ki6U9LykeyT9SdK/JY1rq8dfvHgxP/rR\nj5g5cybz58+nqamJu+++u60e3sxsp5bmnv+XgU8Aa4F9gBPbegXr169nzZo1dO7cmdWrV7PXXnu1\n9SrMzHZKqez5S7oBGAA8BJweEX8B1rXlOvr06cO4cePo168fvXv3Zvfdd+foo49uy1WYme20FBHp\nPLDUAAyLiGXJ9QlAY0Rcu5X7jAHGAPTsucchV/xg0hbL1PbZHYCVK1cyfvx4rrjiCmpqapgwYQIf\n+9jHqKura/PnUqyxsZGamppU17E9qjUXVG825yqPc5Unq1yjRo2aFRHDtrVcVR3wjYgbgRsB+g3Y\nL66bt2W8htNHAjBlyhQ+9KEPceKJhdGkV199lRkzZjBy5MhUM9bX16e+ju1RrbmgerM5V3mcqzzV\nmqtZu/2oZ79+/ZgxYwarV68mInjkkUc44IADso5lZtYutNvyP+ywwzjllFM4+OCDqa2tZcOGDYwZ\nMybrWGZm7ULqwz6S3gfMBN4NbJB0ETAoIlbs6GNPnDiRiRMn7ujDmJnlTmrlHxH9i672Lff+XTt3\nZEHRH3SZmVnbabfDPmZmtv1c/mZmOeTyNzPLIZe/mVkOufzNzHLI5W9mlkMufzOzHHL5m5nlkMvf\nzCyHXP5mZjnk8jczyyGXv5lZDrn8zcxyyOVvZpZDLn8zsxxy+ZuZ5ZDL38wsh1z+ZmY55PI3M8sh\nl7+ZWQ65/M3Mcsjlb2aWQy5/M7MccvmbmeWQy9/MLIdc/mZmOeTyNzPLIZe/mVkOufzNzHLI5W9m\nlkMufzOzHHL5m5nlkMvfzCyHXP5mZjnk8jczyyGXv5lZDrn8zcxyyOVvZpZDioisM7RI0kpgQdY5\nWtATWJZ1iBZUay6o3mzOVR7nKk9WufaJiD22tVCnSiTZTgsiYljWITYnaaZzladaszlXeZyrPNWa\nq5mHfczMcsjlb2aWQ9Vc/jdm
HaAVzlW+as3mXOVxrvJUay6gig/4mplZeqp5z9/MzFLi8jczy6Gq\nLH9Jn5S0QNJfJV2acZYGSfMkzZY0M5nXQ9J0SS8m/76nAjlukbRE0vyiea3mkHRZsv0WSDqmwrkm\nSFqcbLPZko7NINfekh6T9JykZyX9n2R+pttsK7ky3WaSukj6s6Q5Sa6Jyfyst1druTL/GUvW1VHS\nM5IeTK5n/jtZsoioqgvQEXgJGADsAswBBmWYpwHoudm8a4BLk+lLgasrkGMEcDAwf1s5gEHJdtsV\n2DfZnh0rmGsCMK6FZSuZqzdwcDL9LmBhsv5Mt9lWcmW6zQABNcl0Z+Ap4PAq2F6t5cr8ZyxZ38XA\nL4AHk+uZ/06WeqnGPf9Dgb9GxN8iYi1wN/DpjDNt7tPAbcn0bcCJaa8wIv4A/LPEHJ8G7o6If0fE\ny8BfKWzXSuVqTSVzvRYRTyfTK4HngT5kvM22kqs1lcoVEdGYXO2cXILst1druVpTsZ8xSX2B44Cb\nNlt/pr+TparG8u8DLCq6/gpb/+VIWwC/lzRL0phk3p4R8Voy/TqwZzbRWs1RDdvwAklzk2Gh5re+\nmeSS1B/4EIW9xqrZZpvlgoy3WTKEMRtYAkyPiKrYXq3kgux/xn4AfA3YUDQv8+1Vqmos/2ozPCKG\nAqOBr0gaUXxjFN7TZf552WrJkbiewrDdUOA14LqsgkiqAe4BLoqIFcW3ZbnNWsiV+TaLiKbkZ70v\ncKikAze7PZPt1UquTLeXpOOBJRExq7Vlqux3cgvVWP6Lgb2LrvdN5mUiIhYn/y4Bfk3hrdobknoD\nJP8uySheazky3YYR8UbyC7sBmMQ7b28rmktSZwoFe2dE3JvMznybtZSrWrZZkmU58BjwSapge7WU\nqwq215HApyQ1UBia/rikO6ii7bUt1Vj+fwE+IGlfSbsAnwPuzyKIpG6S3tU8DRwNzE/ynJksdiZw\nXxb5tpLjfuBzknaVtC/wAeDPlQrV/MOfOInCNqtoLkkCbgaej4jvFd2U6TZrLVfW20zSHpK6J9Nd\ngTrgBbLfXi3mynp7RcRlEdE3IvpT6KhHI+LzVOnvZIuyPNrc2gU4lsKnIF4C/ifDHAMoHKGfAzzb\nnAV4L/AI8CLwe6BHBbLcReHt7ToK44Xnbi0H8D/J9lsAjK5wrtuBecBcCj/0vTPINZzCW+65wOzk\ncmzW22wruTLdZsBBwDPJ+ucDV2zrZz3jXJn/jBWtbyTvfNon89/JUi8+vYOZWQ5V47CPmZmlzOVv\nZpZDLn8zsxxy+ZuZ5ZDL38wsh6r5C9zNUiGpicLHBJudGBENGcUxy4Q/6mm5I6kxImoquL5OEbG+\nUuszK4WHfcw2I6m3pD8k54mfL+mjyfxPSno6Obf8I8m8HpKmJicYmyHpoGT+BEm3S3oSuD05Odl3\nJf0lWfZLGT5FMw/7WC51Tc4SCfByRJy02e3/CTwcEd+W1BHYTdIeFM4hMyIiXpbUI1l2IvBMRJwo\n6ePAzymcbAwK53AfHhFrkjPCvh0RH5a0K/CkpN9F4fS+ZhXn8rc8WhOFs0S25i/ALckJ2KZGxGxJ\nI4E/NJd1RDR/h8Fw4DPJvEclvVfSu5Pb7o+INcn00cBBkk5Jru9O4fwuLn/LhMvfbDMR8Yfk1N3H\nAZMlfQ94azsealXRtIALIuLhtshotqM85m+2GUn7AG9ExCQK39J0MDADGJGckZGiYZ8ngNOTeSOB\nZbHZ9wYkHgb+K3k3gaSByZlizTLhPX+zLY0EvippHdAInBERS5Nx+3sldaBwnvY6Ct8le4ukucBq\n3jmd7+ZuAvoDTyendV5KBb7+06w1/qinmVkOedjHzCyHXP5mZjnk8jczyyGXv5lZDrn8zcxyyOVv\nZpZDLn8zsxz6/1DstSrijRS3AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1df71259400>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 显示重要特征\n",
    "import matplotlib.pyplot as plt\n",
    "from xgboost import plot_importance\n",
    "plot_importance(model_iris_xgboost)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 存储推理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.91228070175438603"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 测评结果\n",
    "from sklearn.metrics import precision_score\n",
    "\n",
    "precision_score(Y_test, result, average=\"macro\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['iris_xgboost.pkl']"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 序列化\n",
    "joblib.dump(model_iris_xgboost, \"iris_xgboost.pkl\", compress=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(4,) (1, 4)\n",
      "2 2\n"
     ]
    }
   ],
   "source": [
    "# 加载\n",
    "model_iris_xgboost = joblib.load(\"iris_xgboost.pkl\")\n",
    "# 升维\n",
    "x = np.expand_dims(X_test[0, :], axis=0)\n",
    "print(X_test[0, :].shape, x.shape)\n",
    "x = xgb.DMatrix(x)\n",
    "# 推理\n",
    "out_ = np.argmax(model_iris_xgboost.predict(x))\n",
    "# 结果           \n",
    "real = Y_test[0]\n",
    "print(out_, real)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
